09/623068

534 Rec'd PCT/PTO 26 AUG 2000

1/2 Letter requesting Amendments to PCT Application PCT/US99/04376 upon 30 month entry into the U.S. National Stage (August 26, 2000)

(The amendments do not add new matter to the specification. Claims are being cancelled.)

The applicants hereby request the following amendments to the application upon entry into the U.S. National Stage:

Amendments to the background of the application:

1) page 5 line 22 after "territory" insert --and that it was difficult to predict the power of using a less dense map at that time--

2) page 5 line 22 after "10" insert --The inventor's work, however, is a predictor of the power and success of a less dense map.--

A replacement page 5 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect these amendments 1) and 2) to the background.

3) page 6 line 22 after "TDT," change the text "to increase the likelihood of conditions occurring that increase the power of the TDT in the linkage study, the bi-allelic markers used in the study are chosen so that the least common allele frequencies of the markers vary systematically over a range or subrange of least common allele frequency." from bold face italics to regular italics with underlining. A replacement page 6 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect amendment 3) to the background.

4) page 7 line 6 after "TDT," change the text "to increase the likelihood of both criteria (1) and (2) occurring for one or more markers, so as to increase the power of the TDT in the linkage study, the bi-allelic markers used in the study are chosen so that the least common allele frequencies of the markers vary systematically over a range or subrange of least common allele frequency AND the chromosomal location of the markers vary systematically over one or more chromosomes or chromosomal regions. And the bi-allelic markers are chosen so that the markers' chromosomal locations and least common allele frequencies vary systematically in an essentially independent manner." from bold face italics to regular italics with underlining.

5) page 7 line 32 delete the text in brackets [In addition, the two-dimensional linkage study techniques do not necessarily favor using markers in a scan that are about evenly spaced along a chromosome as in the conventional techniques. This is because]. On page 7 line 31 after the text "unfavorably" insert on the next line --Conventional techniques use a one-dimensional concept of "closeness". These techniques space markers about evenly along a chromosome in the hope that some markers will be "close" (on the chromosome) to the sought gene. (They also favor bi-allelic markers with least common allele frequencies near 0.5.) These--.

A replacement page 7 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect amendments 4) and 5) to the background.

6) page 8 line 18 insert on the next line after "background" --Summary

Versions of the invention use a new, two-dimensional concept of "closeness" for association-based linkage studies. Versions of the invention use bi-allelic markers that "cover" or are distributed approximately evenly (or systematically) over two-dimensional regions. These regions have the two dimensions of chromosomal location and least common allele frequency.

Conventional techniques suffer from a kind of one-dimensional lack of depth perception. (They also favor bi-allelic markers with least common allele frequencies near 0.5.) Two-dimensional linkage study techniques overcome this lack of depth perception. These two-dimensional techniques greatly increase the chance that one or more markers used in a study will be close to the sought gene in two-dimensions. This results in more powerful, systematic and efficient methods (including computer programs) and machines for finding genes, such as harmful genes and genes of only modest effect. These techniques also use less dense (more efficient) marker maps (or marker "coverings").

The basic principles behind the two-dimensional approach spawn numerous other inventions. These include methods, machines and compositions of matter (groups of molecules) used for gathering the data (i.e. genotype/sample allele frequency data) used in the new two-dimensional studies, and computer techniques for using and handling such data. These techniques work for creatures other than human beings. And they work for markers and genes that are not bi-allelic (any marker or gene can be mathematically transformed to behave like it is bi-allelic). This summary is not exhaustive or limiting, there are other inventions not listed or specifically described here.--

A replacement page 8 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect amendment 6) to the background.

Amendments to the Description

7) page 38 line 2, page 38 line 17 and page 38 line 20 delete the text "Best Mode" and replace the text "Best Mode" with the text "Set/Subset Example". A replacement page 38 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect the amendments to the description under item 7).

8) page 43 line 4 delete the text "Best Mode" and replace the text "Best Mode" with the text "Set/Subset Example". A replacement page 43 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect the amendment to the description under item 8).

9) page 46 line 24 and line 28 delete the text "Best Mode" and replace the text "Best Mode" with the text "Set/Subset Example". On page 46 lines 26 and 27 delete the text "Best Mode" and replace the text "Best Mode" with the text "Set/Subset". A replacement page 46 with header "PCT/US99/04376(U.S. National Stage Entry Aug. 2000)" is enclosed to effect the amendments to the description under item 9).

Canceling of Claims and presentation of uncancelled claims for examination

The applicants hereby request that all claims in the application be cancelled except for the following claims that were filed April 17, 2000: Claims 3, 4, 5, 7, 8, 20, 21, 22, 23, 33, 34, 35, 37, 38, 50, 51, 52, 53, 54, 57. Thus the applicants request that only claims 3, 4, 5, 7, 8, 20, 21, 22, 23, 33, 34, 35, 37, 38, 50, 51, 52, 53, 54, 57 filed April 17, 2000 be examined.

I hereby attest that no new matter is added to the specification of the application by the amendments requested in the two pages of this letter.

Respectfully submitted,



Robert McGinnis
U.S. Patent Agent 44,232



PCT/US99/04376(U.S. National Stage Entry Aug. 2000)



5 

1 0.5/0.5. Secondly, bi-allelic markers with lower least common allele frequencies, less than 0.3(0.7/0.3) 

2 or 0.2(0.8/0.2), are viewed unfavorably for linkage studies in this reference. Thirdly, the early version of 

3 the criterion of "information content" of markers used in this reference was based on sib pair analysis 

4 and the later, current version of the criterion, does not depend on any particular test for linkage. 5, 6 

5 Thus, the criterion of information content in this reference has never specifically employed the

6 TDT (transmission disequilibrium test) or any association based test, whereas the two- 

7 dimensional linkage study techniques of this application are based on a completely different 

8 perspective of using association based tests. (This reference 4 is not admitted to be prior art with 

9 respect to the present invention by its mention in this background.)

10 Increased Power of the TDT (transmission disequilibrium test) 

11 Characteristics of a new type of linkage test, the TDT (transmission disequilibrium test), were described

12 in 1993. The inventor, R.E. McGinnis, was one of the authors of this reference. 7 In 1996, Risch and

13 Merikangas argued that conventional linkage analysis has limited power to detect genes of modest 

14 effect. And Risch and Merikangas attempted to illustrate the increased power of association based 

15 linkage tests such as the TDT over other types of conventional linkage tests. 8 However, Risch and 

16 Merikangas' analysis was criticized by Muller-Myhsok and Abel as being based on the optimal

17 assumption that the analyzed allele was the disease allele itself. Muller-Myhsok and Abel concluded 

18 that researchers should be aware that the power of association studies such as the TDT can be greatly 

19 diminished in more common, less optimal situations. 9 In their response to Muller-Myhsok and Abel's

20 letter, Risch and Merikangas essentially agreed with the logic of Muller-Myhsok and Abel's criticism.

21 Risch and Merikangas stated that to a large extent, the expectation with respect to linkage 

22 disequilibrium across the genome is uncharted territory and that it was difficult to predict the power of 

23 using a less dense map at that time. 10 The inventor's work, however, is a predictor of the power 

24 and success of a less dense map. (None of the references in this paragraph 7, 8, 9, 10 is admitted to

25 be prior art with respect to the present invention by their mention in this background.)

26 More Detailed Studies of the Power of the TDT 

27 The inventor, R.E. McGinnis, has done extensive investigations on the power of the TDT. His

28 observations and calculations of the increased power of the TDT in many situations have been 



5 Kruglyak et al.: Complete Multipoint Sib-Pair Analysis of Qualitative and Quantitative Traits. Am J
Hum Genet, 1995, vol. 57, pp. 439-454.

6 Kruglyak et al.: Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach.
Am J Hum Genet, 1996, vol. 58, pp. 1347-1363.

7 Spielman, R.S., McGinnis, R.E., Ewens, W.J.: Transmission Test for Linkage Disequilibrium: The
Insulin Gene Region and Insulin-dependent Diabetes Mellitus (IDDM). Am J Hum Genet, 1993, vol. 52,
pp. 506-516.

8 Risch, N. and Merikangas, K.: The Future of Genetic Studies of Complex Human Diseases. Science, 
13 September 1996, vol. 273, pp. 1516-1517. 

9 Muller-Myhsok, B. and Abel, L.: Technical Comments: The Future of Complex Diseases. Science, 28
February 1997, vol. 275, pp. 1328-1329.

10 Risch, N. and Merikangas, K.: Technical Comments: The Future of Complex Diseases. Science, 28 
February 1997, vol. 275, p. 1330. 






6 

1 published. 11 In this paper a general framework for determining the power of the TDT in many different 

2 situations is presented. The analysis of Risch and Merikangas 8 and others is shown by the inventor to 

3 be a special case of his general framework. His observations and calculations published in this paper 

4 have shown that the TDT has increased power in more common, less optimal situations as well as the 

5 less common, optimal situation cited by Muller-Myhsok and Abel 9. As opposed to the observation of

6 Muller-Myhsok and Abel, the inventor's calculations indicate that association tests such as the TDT 

7 have increased power in typical situations even when the ratio m/p departs significantly from unity and/or

8 the linkage disequilibrium between the analyzed (marker) allele and disease polymorphism is only

9 half its maximum possible value. The inventor arrived at these conclusions independently and did not 

10 derive them from others. 

11 A Major Conclusion Drawn by the Inventor about the TDT and Linkage Studies: Using Bi-allelic 

12 Markers of Systematically Varying Allele Frequencies Increases the Power of Linkage Studies 

13 Using the TDT 

14 The inventor's calculations and observations about the increased power of the TDT in more common, 

15 less optimal situations led him to the conclusion that the power of linkage studies using the TDT is 

16 greatly increased under some conditions. Under some conditions, the power of the TDT in a linkage 

17 study using bi-allelic markers is greatly increased when each of one or more of the bi-allelic markers 

18 used in the study fulfills two criteria: (1) the allele frequencies of each of the one or more of the bi-allelic

19 markers are similar (but not necessarily the same, or even approximately the same) as the allele 

20 frequencies of an unknown bi-allelic gene causing a disease under study; and (2) each of the one or 

21 more bi-allelic markers is in some degree of linkage disequilibrium with the gene. Thus for a typical 

22 linkage study using bi-allelic markers and the TDT, to increase the likelihood of conditions occurring 

23 that increase the power of the TDT in the linkage study, the bi-allelic markers used in the study are 

24 chosen so that the least common allele frequencies of the markers vary systematically over a range or 

25 subrange of least common allele frequency. This major conclusion of the inventor's research is quoted 

26 directly from his unpublished manuscript that was included with previously filed U.S. Provisional Patent 

27 Applications: "This example is typical and highlights perhaps the most important finding of this paper; 

28 namely the importance of using bi-allelic markers with heterozygosity similar to that of a bi-allelic 

29 disease gene. Indeed, since a majority of susceptibility loci may be bi-allelic, the judicious use of bi- 

30 allelic markers of both high, medium and low heterozygosity may be crucial in order to detect and 

31 replicate linkages to loci conferring modest disease risk." (page 25) (In this context the phrase "bi-allelic 

32 markers with heterozygosity similar to that of a bi-allelic disease gene" is essentially equivalent to "bi-

33 allelic markers with individual allele frequencies similar to those of a bi-allelic disease gene" and "bi-

34 allelic markers of both high, medium and low heterozygosity" is essentially equivalent to the phrase "bi-

35 allelic markers whose least common individual allele frequencies are high, medium and low".)
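The equivalence stated in the parenthetical above rests on a simple property of a bi-allelic locus: with allele frequencies p and 1 - p, the expected heterozygosity is H = 2p(1 - p), which rises strictly as p goes from 0 to 0.5, so each heterozygosity value corresponds to exactly one least common allele frequency. The Python sketch below illustrates this under the assumption of Hardy-Weinberg proportions; the function names are hypothetical and are not part of the application.

```python
import math

def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) of a bi-allelic locus with allele
    frequencies p and 1-p (Hardy-Weinberg proportions assumed)."""
    return 2.0 * p * (1.0 - p)

def least_common_allele_frequency(h: float) -> float:
    """Invert H = 2p(1-p) on the branch 0 <= p <= 0.5.

    H increases strictly with p on [0, 0.5], so each H in [0, 0.5]
    corresponds to exactly one least common allele frequency.
    """
    if not 0.0 <= h <= 0.5:
        raise ValueError("bi-allelic heterozygosity must lie in [0, 0.5]")
    return (1.0 - math.sqrt(1.0 - 2.0 * h)) / 2.0

for p in (0.05, 0.1, 0.2, 0.3, 0.5):
    h = heterozygosity(p)
    print(f"p = {p:.2f}  H = {h:.3f}  recovered p = {least_common_allele_frequency(h):.2f}")
```

In this reading, "high, medium and low heterozygosity" and "high, medium and low least common allele frequency" pick out the same markers, ordered the same way.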

36 Systematically Varying Both Marker Chromosomal Location and Marker Allele Frequency of Markers in 

37 Linkage Studies 



11 McGinnis, R.E.: Hidden Linkage: Comparison of the affected sib pair (ASP) test and transmission 
disequilibrium test (TDT). Annals of Human Genetics, 1998, vol. 62, pp. 159-179. 






7 

1 The inventor's calculations and observations have demonstrated the increased power of the TDT in 

2 more common, less optimal situations when a bi-allelic marker and bi-allelic gene have (1) similar but 

3 not identical allele frequencies and (2) the marker and gene are in some degree of linkage 

4 disequilibrium. Thus, for a typical linkage study using bi-allelic markers and the TDT, to increase the

5 likelihood of both criteria (1) and (2) occurring for one or more markers, so as to increase the power of 

6 the TDT in the linkage study, the bi-allelic markers used in the study are chosen so that the least 
7 common allele frequencies of the markers vary systematically over a range or subrange of least

8 common allele frequency AND the chromosomal location of the markers vary systematically over one or 

9 more chromosomes or chromosomal regions. And the bi-allelic markers are chosen so that the 

10 markers' chromosomal locations and least common allele frequencies vary systematically in an

11 essentially independent manner. 

12 Two-dimensional Linkage Study Techniques 

13 As has been stated, conventional linkage study scanning techniques use markers that are distributed 

14 approximately evenly in the dimension of chromosomal location. These conventional, one dimensional, 

15 scanning techniques focus primarily on the chromosomal location of markers used in a scan and give 

16 little attention to the dimension of allele frequency. 1, 2, 3

17 One of the main implications of the inventor's work is to use a set of bi-allelic markers for a typical 

18 linkage study using the TDT (or other association-based linkage test) wherein the chromosomal 

19 locations and least common allele frequencies of the markers in the set systematically vary in an 

20 essentially independent manner over the dimensions of chromosomal location and least common allele 

21 frequency respectively. This is equivalent to using a set of bi-allelic markers for a linkage study scan 

22 wherein the set of markers systematically scan or "cover" a two-dimensional region having dimensions 

23 of chromosomal location and least common allele frequency. (Such a two-dimensional region can be 

24 thought of as an area in an x-y plot or a group of squares on a chessboard.) 
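One way to check that a set of markers systematically "covers" such a two-dimensional region is to lay a grid of equal cells over it and count the markers falling in each cell, in the spirit of the chessboard picture above. The sketch below is a minimal illustration only, assuming markers on a single chromosome given as (chromosomal location, least common allele frequency) pairs, locations in centimorgans, and a requirement of at least one marker per cell; the grid parameters and function names are hypothetical.

```python
from collections import Counter

def cell_of(cl, f, cl0, f0, cell_length, cell_width):
    """(row, column) of the grid cell holding the point (cl, f).

    Columns step along chromosomal location, rows along least common allele
    frequency. Cells are half-open, so a point on the far edge of the grid
    falls outside it.
    """
    return int((f - f0) // cell_width), int((cl - cl0) // cell_length)

def uncovered_cells(markers, cl0, f0, cell_length, cell_width, rows, cols, min_per_cell=1):
    """Cells of a rows x cols grid holding fewer than min_per_cell markers."""
    counts = Counter()
    for cl, f in markers:
        r, c = cell_of(cl, f, cl0, f0, cell_length, cell_width)
        if 0 <= r < rows and 0 <= c < cols:  # ignore markers outside the grid
            counts[(r, c)] += 1
    return [(r, c) for r in range(rows) for c in range(cols)
            if counts[(r, c)] < min_per_cell]

# Hypothetical region: 0-20 cM by least common allele frequency 0-0.5,
# split into 10 cM x 0.25 cells (2 rows x 2 columns).
markers = [(3.0, 0.45), (15.0, 0.40), (4.0, 0.10)]
print(uncovered_cells(markers, cl0=0.0, f0=0.0, cell_length=10.0,
                      cell_width=0.25, rows=2, cols=2))  # [(0, 1)]
```

Here the empty cell (0, 1) says that no marker with least common allele frequency below 0.25 lies between 10 and 20 cM, a gap that a location-only check would not reveal.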

25 In addition, the inventor's calculations and observations indicate that bi-allelic markers having least 

26 common allele frequencies less than 0.3, 0.2 or even less than 0.1 have an important place in linkage 

27 studies using association based linkage tests. This is markedly different than Kruglyak's information 

28 content evaluation of bi-allelic markers for use in linkage studies, in which bi-allelic markers with least 

29 common allele frequencies less than 0.3 or 0.2 are viewed unfavorably. 4 

30 Conventional techniques use a one-dimensional concept of "closeness". These techniques 

31 space markers about evenly along a chromosome in the hope that some markers will be "close"

32 (on the chromosome) to the sought gene. (They also favor bi-allelic markers with least common

33 allele frequencies near 0.5.) These conventional techniques suffer from a kind of one-

34 dimensional view or lack of depth perception. In the conventional techniques, a marker can look

35 very close to a gene's location in terms of chromosomal location, but the marker can be very far 

36 from the gene's location in the new two-dimensional view used by versions of the invention. 

37 It is as if the conventional 1D techniques look at a chessboard from on edge. Markers and a

38 gene which are on different squares of the board, but in the same column of squares, look very

39 close to each other when the board is looked at from on edge. But when the board is looked at






8 

1 from the top in two dimensions (2D), markers which looked very close to each other and the

2 gene before (when looking from on edge) can be seen to be very far from the gene. 
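The difference between the one-dimensional and two-dimensional views can be made concrete with a CL-F distance that has a chromosomal-location component and a frequency component. In the sketch below, which is illustrative only, a marker half a centimorgan from a hypothetical gene passes a location-only test of closeness but fails once allele frequency is also compared; the coordinates, the threshold [X, Y] and the componentwise reading of "less than or equal to [X, Y]" are all assumptions.

```python
def clf_distance(a, b):
    """CL-F distance between points (location, frequency), as the pair
    [chromosomal-location distance, frequency distance]."""
    return (abs(a[0] - b[0]), abs(a[1] - b[1]))

def within(dist, limit):
    """Assumed reading of 'dist <= limit': each component is at most its limit."""
    return dist[0] <= limit[0] and dist[1] <= limit[1]

gene = (52.0, 0.08)    # hypothetical gene: 52 cM, least common allele frequency 0.08
marker = (52.5, 0.45)  # hypothetical marker: 0.5 cM away, but a common allele
limit = (2.0, 0.10)    # hypothetical closeness threshold [X, Y]

d = clf_distance(gene, marker)
print("close on the chromosome:", d[0] <= limit[0])  # True
print("close in two dimensions:", within(d, limit))  # False
```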

3 Further Implications of the Two-dimensional Linkage Study Perspective 

4 These two-dimensional techniques work when multiple genes cause a genetic characteristic and are 

5 effective in searching for these genes. A two-dimensional bi-allelic marker "covering" or scanning

6 approach also increases the power of linkage studies using other association based linkage tests such

7 as the AFBAC method, the haplotype relative risk (HRR) method 12, and comparison of marker allele

8 frequencies in disease cases and unrelated controls 13. (These references 12, 13 are not admitted to be

9 prior art with respect to the present invention by their mention in this background.)

10 Patents That May Be Helpful In Starting A Search Of The Background 

11 Some patents that are in the same general areas as versions of the invention are cited here: US Patent 

12 Number 5,667,976 Solid supports for nucleic acid hybridization assays. Published International 

13 Application WO 98/20165 Biallelic Markers. Published International Application WO 98/07887 Methods 

14 for treating bipolar mood disorder associated with markers on chromosome 18p. US Patent Number

15 5,552,270 Methods of DNA sequencing by hybridization based on optimizing concentration of matrix- 

16 bound oligonucleotide and device for carrying out same. No patent in this paragraph is admitted to

17 be prior art with respect to the present invention by its mention in this background.

18 Summary 

19 Versions of the invention use a new, two-dimensional concept of "closeness" for association- 

20 based linkage studies. Versions of the invention use bi-allelic markers that "cover" or are distributed

21 approximately evenly (or systematically) over two-dimensional regions. These regions have the two 

22 dimensions of chromosomal location and least common allele frequency. 

23 Conventional techniques suffer from a kind of one-dimensional lack of depth perception. (They 

24 also favor bi-allelic markers with least common allele frequencies near 0.5.) Two-dimensional linkage 

25 study techniques overcome this lack of depth perception. These two-dimensional techniques greatly 

26 increase the chance that one or more markers used in a study will be close to the sought gene in two- 

27 dimensions. This results in more powerful, systematic and efficient methods (including computer 

28 programs) and machines for finding genes, such as harmful genes and genes of only modest effect. 

29 These techniques also use less dense (more efficient) marker maps (or marker "coverings"). 

30 The basic principles behind the two-dimensional approach spawn numerous other inventions. These 

31 include methods, machines and compositions of matter (groups of molecules) used for gathering the 

32 data (i.e. genotype/sample allele frequency data) used in the new two-dimensional studies, and 

33 computer techniques for using and handling such data. These techniques work for creatures other than 

34 human beings. And they work for markers and genes that are not bi-allelic (any marker or gene can be 

35 mathematically transformed to behave like it is bi-allelic). This summary is not exhaustive or limiting, 

36 there are other inventions not listed or specifically described here. 



12 Falk CT and Rubinstein P: Haplotype relative risks: an easy reliable way to construct a proper
control sample for risk calculations. Annals of Human Genetics, 1987, vol. 51, pp. 227-233.

13 Bell GI, Horita S and Karam JH: A polymorphic locus near the human insulin gene is associated with
insulin-dependent diabetes mellitus. Diabetes, 1984, vol. 33, pp. 176-183.



WO 99/43858 




PCT/US99/04376 



1 A CL-F matrix is a matrix of rectangular cells of the same length and the same width on a CL-F map.

2 Stipulating that a certain number of covering markers be placed in each cell of the matrix is a method

3 of illustrating particular types of systematic covering of a CL-F region with covering markers.

4 The evidence for linkage obtained from two-dimensional linkage studies is essentially two-dimensional 

5 in nature and it is possible to use this two-dimensional information by essentially graphing quantitative 

6 evidence for linkage as a function of position in the x-y plane. For example, if quantitative evidence for 

7 linkage is represented in the z dimension of a typical three-dimensional x-y-z plot, wherein the x and y

8 dimensions are chromosomal location and least common allele frequency respectively, then it is 

9 possible to conceptualize evidence for linkage as occurring in a "hump" or "humps" in the z dimension. 

10 And it is possible to analyze the data to find the CL-F location (in the x-y plane) of the peak(s) of this

11 "hump(s)", thus helping to localize a trait causing gene to the CL-F locale of the peak(s) of the

12 "hump(s)".
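Locating the peak of the "hump" can be sketched, in its simplest form, as finding the grid cell with the largest evidence score and reporting that cell's centre in the x-y (CL-F) plane. The Python below is an illustration under assumptions not stated in the application: the evidence values are hypothetical, they are already summarized per cell of an equal-cell grid, and a plain maximum is used rather than any smoothing or curve fitting.

```python
def peak_location(z, cl0, f0, cell_length, cell_width):
    """(chromosomal location, frequency) centre of the grid cell with the
    largest linkage-evidence value.

    z[row][col] is a hypothetical quantitative evidence score for the cell at
    row `row` (least common allele frequency) and column `col` (location).
    """
    r, c = max(
        ((r, c) for r in range(len(z)) for c in range(len(z[r]))),
        key=lambda rc: z[rc[0]][rc[1]],
    )
    return (cl0 + (c + 0.5) * cell_length, f0 + (r + 0.5) * cell_width)

z = [
    [0.2, 0.9, 0.3],
    [0.1, 3.1, 1.2],  # hump centred on row 1, column 1
    [0.0, 0.8, 0.4],
]
print(peak_location(z, cl0=0.0, f0=0.0, cell_length=5.0, cell_width=0.1))
```

With several humps, the same idea extends to reporting every local maximum rather than only the single largest cell.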

13 Versions of the invention also make use of multi-allelic genes and/or markers. It is always possible to 

14 combine the alleles of a multi-allelic polymorphism (marker or gene) so that the polymorphism acts 

15 mathematically like it is a bi-allelic polymorphism. In effect, it is always possible to mathematically 

16 transform a multi-allelic marker or gene to act bi-allelic. Similarly, two or more markers can always be 

17 mathematically combined to form a mathematical marker that acts like a single bi-allelic marker. And 

18 two or more genes can always be mathematically combined to form a mathematical gene that acts like 

19 a single bi-allelic gene. In this application a mathematical bi-allelic marker formed mathematically from 

20 one or more markers is called a bi-allelic marker equivalent or BME; and a mathematical bi-allelic gene 

21 formed mathematically from one or more genes is called a bi-allelic gene equivalent or BGE. 
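Combining the alleles of a multi-allelic marker into two groups can be sketched as a recoding step. In the example below, which is a minimal illustration and not a procedure taken from the application, the hypothetical true allele A1 forms one allele equivalent and all other true alleles form the second; the least common allele-equivalent frequency of the resulting BME is then computed from allele counts. The genotypes and the function name are invented for illustration.

```python
from collections import Counter

def bi_allelic_equivalent(genotypes, group_one):
    """Recode a multi-allelic marker as a bi-allelic marker equivalent (BME).

    genotypes: list of (allele, allele) pairs observed at one marker.
    group_one: set of true alleles pooled into allele equivalent "1"; every
               other true allele is pooled into allele equivalent "2".
    Returns the recoded genotypes and the frequency of allele equivalent "1".
    """
    def recode(allele):
        return "1" if allele in group_one else "2"

    recoded = [tuple(sorted((recode(a), recode(b)))) for a, b in genotypes]
    counts = Counter(allele for pair in recoded for allele in pair)
    return recoded, counts["1"] / (2 * len(recoded))

genotypes = [("A1", "A3"), ("A2", "A2"), ("A1", "A4"), ("A3", "A4")]
recoded, f1 = bi_allelic_equivalent(genotypes, group_one={"A1"})
print(recoded)  # [('1', '2'), ('2', '2'), ('1', '2'), ('2', '2')]
print("least common allele-equivalent frequency:", min(f1, 1 - f1))  # 0.25
```

Choosing a different grouping (for example {"A1", "A2"}) yields a different BME from the same true marker, which is one way a single multi-allelic marker can supply bi-allelic marker equivalents at more than one least common allele frequency.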

22 The term true marker or gene is used to distinguish a marker or gene in the ordinary sense from a bi- 

23 allelic marker equivalent (BME) or bi-allelic gene equivalent (BGE). The term true allele is used to 

24 distinguish an allele in the ordinary sense from a mathematical allele of a BME or BGE. A mathematical 

25 allele of a BME or BGE is referred to as an allele equivalent. An allele equivalent is a combination of 

26 one or more true alleles or one or more haplotypes.

27 Versions of the invention make use of genes and/or markers which are not exactly bi-allelic. These

28 genes or markers are approximately bi-allelic. A gene or marker that is approximately bi-allelic almost

29 always occurs in one of two allele forms; however, very rarely it occurs in a different allele form.

30 Various versions of the invention are for genotyping individuals at markers which systematically cover 

31 CL-F regions or for obtaining sample allele frequency data (such as from pooled DNA) for a sample of

32 individuals for markers which systematically cover CL-F regions. Various versions of the invention are 

33 for oligonucleotides used for genotyping individuals at markers which systematically cover CL-F regions 

34 or are for obtaining sample allele frequency data (such as from pooled DNA) for a sample of individuals 

35 for markers which systematically cover CL-F regions. 
36 

37 Definitions

38 

39 For the purposes of the description and claims the terms used herein will have their generally accepted 

40 definition unless otherwise specified. 



SUBSTITUTE SHEET (RULE 26) 






15 

1 If a CL-F region is said to comprise an area of greater than or equal to X multiplied by Y, then the

2 CL-F region comprises one or more nonoverlapping segment-subranges, and the sum of the areas of 

3 the segment-subranges is greater than or equal to X multiplied by Y. 

4 A CL-F matrix is a collection of segment-subranges, wherein each segment-subrange of the collection 

5 has the same width and the same length. Each segment-subrange in the collection (or the matrix) is a 

6 CL-F matrix cell. Any one CL-F matrix cell in a CL-F matrix shares two or more of the cell's borders 

7 with two or more other cells in the matrix. And all the cells in a CL-F matrix together form a single 

8 segment-subrange. A CL-F matrix is characterized by the length and the width of the cells in the matrix,

9 denoted by length x width, or $L_{MC} \times W_{MC}$, wherein $L_{MC}$ is the length of each cell in the matrix and $W_{MC}$ is

10 the width of each cell in the matrix. A CL-F matrix is also characterized by the number of rows of cells,

11 $R_M$, in the matrix. And a CL-F matrix is characterized by the number of columns of cells, $C_M$, in the

12 matrix. There are two or more cells in a CL-F matrix. A CL-F matrix is also characterized by the point of

13 origin of the matrix, denoted by $(cl_0, f_0)$. The point of origin of a CL-F matrix is at any chromosomal

14 location and $cl_0$ takes on any reasonable value in an entire species genome. The point of origin of a

15 CL-F matrix is at any one value in the least common allele frequency range 0 to 0.5. (A CL-F matrix is 

16 similar to the squares of a chessboard or to equal rectangular floor tiles that are all oriented in the same

17 direction and cover a rectangular floor. One corner of the matrix is the matrix's point of origin.)

18 The width of each cell of a particular CL-F matrix is any value greater than zero and less than 0.5.

19 The width of a cell is often denoted by $W_{MC}$.

20 Any length in chromosomal location distance units is chosen for the length of each cell of a particular

21 CL-F matrix. The length of a cell is often denoted by $L_{MC}$.

22 The centerpoint of a CL-F matrix cell is in the center of the cell. The centerpoints of a CL-F matrix form 

23 a matrix centerpoint lattice. Each point of a matrix centerpoint lattice is separated by a CL-F distance 

24 of $[0, W_{MC}]$ or $[L_{MC}, 0]$ from two or more neighboring centerpoints.
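The cell and centerpoint definitions above can be sketched in a few lines of Python. The function below, whose name and argument order are hypothetical, builds the centerpoint lattice of a CL-F matrix from its origin (cl0, f0), cell length L_MC, cell width W_MC and numbers of rows and columns, and enforces the stated limits on the origin frequency, the cell width and the number of cells; requiring a positive cell length is an added assumption.

```python
def clf_matrix_centerpoints(cl0, f0, cell_length, cell_width, rows, cols):
    """Centerpoints of the cells of a CL-F matrix, keyed by (row, column).

    Columns step along chromosomal location by cell_length (L_MC); rows step
    along least common allele frequency by cell_width (W_MC).
    """
    if not 0.0 <= f0 <= 0.5:
        raise ValueError("origin frequency must lie in the range 0 to 0.5")
    if not 0.0 < cell_width < 0.5:
        raise ValueError("cell width must be greater than 0 and less than 0.5")
    if cell_length <= 0:  # assumption: a usable cell has positive length
        raise ValueError("cell length must be positive")
    if rows * cols < 2:
        raise ValueError("a CL-F matrix has two or more cells")
    return {
        (r, c): (cl0 + (c + 0.5) * cell_length, f0 + (r + 0.5) * cell_width)
        for r in range(rows)
        for c in range(cols)
    }

pts = clf_matrix_centerpoints(cl0=10.0, f0=0.0, cell_length=4.0,
                              cell_width=0.1, rows=3, cols=5)
# Neighbours along a row differ by [L_MC, 0]; along a column by [0, W_MC].
print(pts[(0, 0)], pts[(0, 1)])  # (12.0, 0.05) (16.0, 0.05)
```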

25 If one or more bi-allelic markers are in (or within) the segment-subrange that is a CL-F matrix

26 cell, then each of the markers is in or within the CL-F matrix cell. 

27 If one or more CL-F points are in (or within) a CL-F matrix, then each of the points is in or within a cell

28 of the matrix. 

29 If a CL-F region comprises a CL-F matrix, then each point that is in the matrix is also in the region. 

30 If a CL-F region is a CL-F matrix, then the region consists of the points that are in the matrix. 

31 If two CL-F matrix cells share a common border, then the two CL-F matrix cells are in contact. 

32 If two CL-F matrix cells share a common corner, then the two CL-F matrix cells are touching. (Two

33 cells that are in contact are also touching.) 
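For cells of a single CL-F matrix indexed by (row, column), "in contact" and "touching" reduce to simple index tests, as this hypothetical sketch shows: a shared border means the cells differ by one step in exactly one index, and a shared corner means neither index differs by more than one. The function names are illustrative only.

```python
def in_contact(a, b):
    """Cells a and b, given as (row, column), share a common border."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def touching(a, b):
    """Cells a and b share at least a common corner (contact implies touching)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

print(in_contact((2, 3), (2, 4)), touching((2, 3), (2, 4)))  # True True
print(in_contact((2, 3), (3, 4)), touching((2, 3), (3, 4)))  # False True
print(in_contact((2, 3), (4, 3)), touching((2, 3), (4, 3)))  # False False
```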

34 If a group of CL-F points is connected to within a CL-F distance [X,Y], then for any two points in

35 the group, denoted $p_1$ and $p_R$, there is an ordered sequence of points in the group denoted

36 $p_1, p_2, p_3, \ldots, p_{R-2}, p_{R-1}, p_R$, $R$ being an integer greater than or equal to 2, wherein the CL-F distance between

37 each point in the sequence and the next point in the sequence is less than or equal to [X,Y]. The

38 distance [X,Y] is the connecting distance. (Put in simple terms, if a group of points is connected to

39 within [X,Y], then there is a path between each pair of points in the group, the path consisting of a

40 series of steps, wherein each step in the path is a movement between two points in the group that are






16

1 separated by a CL-F distance of less than or equal to [X,Y]. A simple group of points connected to

2 within a CL-F distance of [X, Y] is a group of three points, wherein each point in the group is within a CL- 

3 F distance of less than or equal to [X, Y] of another point in the group. The concept of connectivity 

4 introduced here is similar to the basic concept of connectivity in mathematical graph theory.) 

5 If a group of N markers is connected to within a CL-F distance [X,Y], wherein N is an integer, then 

6 each of the markers is located at one point of a group of N points, the group of N points being connected

7 to within a CL-F distance [X,Y]. 
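Connectivity to within [X,Y] can be tested with a breadth-first search over the group, treating two points as neighbours when their CL-F distance is less than or equal to [X,Y], much as connectivity is tested in graph theory. The sketch below is illustrative: it reads that comparison componentwise (location gap at most X and frequency gap at most Y), which is an assumption, and the marker coordinates are hypothetical.

```python
from collections import deque

def connected_within(points, x, y):
    """True if the (location, frequency) points are connected to within [x, y].

    Two points are treated as one step apart when their location gap is at
    most x and their frequency gap is at most y (an assumed componentwise
    reading of 'CL-F distance <= [x, y]').
    """
    if len(points) < 2:
        return True

    def close(p, q):
        return abs(p[0] - q[0]) <= x and abs(p[1] - q[1]) <= y

    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from the first point
        i = queue.popleft()
        for j, q in enumerate(points):
            if j not in seen and close(points[i], q):
                seen.add(j)
                queue.append(j)
    return len(seen) == len(points)

markers = [(0.0, 0.10), (3.0, 0.15), (6.0, 0.20)]  # hypothetical markers
print(connected_within(markers, x=3.5, y=0.06))                   # True
print(connected_within(markers + [(20.0, 0.40)], x=3.5, y=0.06))  # False
```

The first and third markers are 6 cM apart, more than X, yet the group is still connected through the middle marker, which is exactly the path-of-steps idea in the definition.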

8 If two bi-allelic markers are said to be in extreme positive disequilibrium, then d is approximately

9 equal to $d_{max}$ for the two markers, which for the purposes of this definition are designated marker M

10 with least common allele A and marker m with least common allele B. Here, according to standard

11 usage, the disequilibrium coefficient (d) is defined by the equation $d = f(AB) - f(A)f(B)$, where f(A) and f(B)

12 are defined as the population frequencies of alleles A and B, respectively, and f(AB) is the population

13 frequency of the AB haplotype. And $d_{max}$ is defined as the maximum possible positive value of d

14 assuming the allele frequencies of A and B are f(A) and f(B), and thus $d_{max} = q - f(A)f(B)$, where q is the

15 lesser of f(A) and f(B). (In this application d is used to represent the disequilibrium coefficient; the

16 symbol δ is often used in scientific papers to represent the disequilibrium coefficient.)

If a pair of markers is said to be in extreme positive disequilibrium, then the two markers of the pair are in extreme positive disequilibrium.

If a pair of bi-allelic markers is said to be redundant within distance D, then the two markers of the pair are in extreme positive disequilibrium, the two markers are located on the same chromosome, and the two markers are located within a CL-F distance D of each other on a CL-F map, wherein D is a specified distance with two components, a chromosomal location distance component D_CL and a frequency distance component D_F; D = [D_CL, D_F].
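The definitions of d, d_max, extreme positive disequilibrium and redundancy within D can be checked with a few lines of arithmetic. In the minimal sketch below, f_ab is the frequency of the haplotype carrying both least common alleles. The text does not quantify "approximately equal", so the relative tolerance `tol` is a hypothetical parameter, and the CL-F distance is again read component-wise as an assumption.

```python
from dataclasses import dataclass

@dataclass
class Marker:
    chrom: str          # chromosome label
    location_cm: float  # chromosomal location (cM)
    lca_freq: float     # least common allele frequency

def d_coef(f_ab, f_a, f_b):
    """Disequilibrium coefficient d = f(AB) - f(A)f(B)."""
    return f_ab - f_a * f_b

def d_max(f_a, f_b):
    """Largest positive d for these allele frequencies: q - f(A)f(B), q = min."""
    return min(f_a, f_b) - f_a * f_b

def extreme_positive(f_ab, f_a, f_b, tol=0.05):
    # Hypothetical reading of "d approximately equal to d_max".
    dm = d_max(f_a, f_b)
    return dm > 0 and d_coef(f_ab, f_a, f_b) >= (1 - tol) * dm

def redundant(m1, m2, f_ab, D_cl, D_f, tol=0.05):
    """Same chromosome, extreme positive disequilibrium, and within D = [D_CL, D_F]."""
    return (m1.chrom == m2.chrom
            and extreme_positive(f_ab, m1.lca_freq, m2.lca_freq, tol)
            and abs(m1.location_cm - m2.location_cm) <= D_cl
            and abs(m1.lca_freq - m2.lca_freq) <= D_f)
```

With f(A) = 0.2 and f(B) = 0.25, d_max = 0.2 - 0.05 = 0.15, so a haplotype frequency f(AB) = 0.2 gives d = d_max, whereas f(AB) = 0.1 gives d = 0.05.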

An allele equivalent (AE) is a group of one or more "haplotype values" of one or more polymorphisms of the same type, either markers or genes. (For the purposes of this application a haplotype value of one polymorphism is equivalent to an allele value at the one polymorphism.) The group of haplotype values is then analyzed as if the group is a single allele at a bi-allelic polymorphism; the group of haplotype values acts as a single allele at a bi-allelic polymorphism; the collection of the one or more polymorphisms upon which the haplotype values are based acts as a bi-allelic polymorphism; the collection of one or more polymorphisms forms a bi-allelic polymorphism equivalent (PE) that acts as a bi-allelic polymorphism; the polymorphism equivalent has (or possesses) the allele equivalent. The allele equivalent belongs to the polymorphism equivalent. In this application, each polymorphism equivalent is a bi-allelic marker equivalent (BME) or a bi-allelic gene equivalent (BGE).
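As a small illustration of an allele equivalent, the sketch below takes a multi-allelic marker (for instance a microsatellite whose alleles are labelled here by hypothetical repeat lengths), assigns a chosen set of its alleles to group I, and recodes each genotype as the number of group-I alleles it carries, so that the marker can be analyzed as a bi-allelic polymorphism equivalent. Which alleles go into group I is a choice made by the analyst; the grouping and the function names used here are assumptions.

```python
def to_allele_equivalent(genotype, group_one):
    """Recode a multi-allelic genotype (pair of allele labels) as the count
    (0, 1 or 2) of alleles that fall in group I."""
    return sum(1 for allele in genotype if allele in group_one)

def ae_frequency(genotypes, group_one):
    """Sample frequency of the group-I allele equivalent."""
    alleles = [a for g in genotypes for a in g]
    return sum(a in group_one for a in alleles) / len(alleles)
```

For genotypes (120, 126), (122, 122) and (124, 128) with group I = {120, 122}, the recoded genotypes are 1, 2 and 0, and the allele-equivalent frequency is 3/6 = 0.5.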

A bi-allelic marker equivalent (BME) is one or more markers and a grouping of the haplotype values of the one or more markers into two groups (e.g. group I and group II). (For the purposes of this application a "haplotype value" of one marker is equivalent to an allele at the one marker.) The one or more markers and the two groups of haplotype values of the one or more markers are then analyzed as if the one or more markers are a single bi-allelic marker with alleles I and II. Each of the groups I and II is an allele equivalent. For example, a multi-allelic microsatellite marker has its multiple alleles grouped into two groups, and the microsatellite marker and these two groups of alleles then act






33 

details regarding this, see Detailed Description of the Systematic Covering of a CL-F Region Used In Versions of the Invention above.

An example of ProcessGd/Safd#1 Genotype data/Sample allele frequency data process, a genotype data process:

Example 1 of ProcessGd/Safd#1: A process for obtaining genotype data/sample allele frequency data for each bi-allelic marker of a group of two or more bi-allelic covering markers in the chromosomal DNA of an individual, wherein the genotype data/sample allele frequency data is genotype data, comprising:

a) means for determining information on the presence or absence of each allele of each bi-allelic marker of a group of two or more bi-allelic covering markers in the chromosomal DNA from an individual, a CL-F region being N covered to within the CL-F distance [12cM, 0.25] or the equivalent thereof by the two or more bi-allelic covering markers, wherein N is an integer number greater than or equal to 1; and

b) means for transforming the information of step a) into genotype data for each marker of the group.
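Step b) can be as simple as a lookup from the two presence/absence results of step a) to a genotype. The sketch below assumes alleles labelled A and B and returns no call when neither allele is detected; the labels, the function name and the no-call convention are assumptions, not part of the process as described.

```python
def genotype_from_presence(has_a, has_b, a="A", b="B"):
    """Turn presence/absence results for the two alleles into a genotype."""
    if has_a and has_b:
        return a + b   # heterozygote
    if has_a:
        return a + a   # homozygote for allele A
    if has_b:
        return b + b   # homozygote for allele B
    return None        # neither allele detected: no call
```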

(It should be noted that the following genotype process is equivalent to Example 1 of ProcessGd/Safd#1: Genotype Process: A process for genotyping an individual, comprising:

a) means to genotype an individual at two or more bi-allelic covering markers, a CL-F region being N covered to within the CL-F distance [12cM, 0.25] or the equivalent thereof by the two or more bi-allelic covering markers, wherein N is an integer number greater than or equal to 1.)

Oligonucleotide technology

Each version of oligonucleotide technology is a means to sense the presence or absence of each of one or more true alleles of a group of true alleles in chromosomal DNA from one or more individuals by means of a hybridization reaction with an oligonucleotide that is complementary to each of the one or more true alleles (see definitions section). Thus versions of oligonucleotide technology are a means of genotyping one or more individuals. Versions of oligonucleotide technology are also a means of obtaining sample allele frequency data for one or more marker alleles for a sample of individuals using pooled DNA from the individuals in the sample.

In Some Versions of Oligonucleotide Technology for Genotyping or Obtaining Sample Allele Frequency Data, a Physico-chemical Signal is Generated when an Allele in Chromosomal DNA and a Complementary Oligonucleotide Hybridize

Some versions of oligonucleotide technology for genotyping or for obtaining sample allele frequency data use a sensor which includes one or more oligonucleotides which are complementary to an allele. When the sensor is exposed to chromosomal DNA from an individual who carries the allele, the oligonucleotides which are complementary to the allele hybridize with chromosomal DNA specimens of the allele. The hybridization generates a physico-chemical signal which indicates the presence of the






34 

allele in the chromosomal DNA of the individual. The lack of the physico-chemical signal indicates no (or negligible) hybridization and that the allele is not present in the chromosomal DNA of the individual.
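How the physico-chemical signal is turned into a present/absent result depends on the technology. One simple rule, assumed here purely for illustration, calls the allele present when its signal exceeds the mean of background (no-target) measurements by more than k standard deviations; the function name, the use of background replicates and the default k = 3 are all hypothetical.

```python
from statistics import mean, stdev

def allele_present(signal, background, k=3.0):
    """Call the allele present if the hybridization signal exceeds the background
    mean by more than k background standard deviations (hypothetical rule)."""
    return signal > mean(background) + k * stdev(background)
```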

Examples of oligonucleotide technology for genotyping, obtaining sample allele frequency data or genotype data/sample allele frequency data

Companies like Affymetrix are using high density arrays of oligonucleotides attached to silicon chips or glass slides to genotype DNA from one individual at thousands of bi-allelic markers.[viii] In some of these versions of oligonucleotide technology, the strengths of hybridization of oligonucleotides that differ at only one base to DNA containing an SNP are compared to determine genotype.[ix] Another version of oligonucleotide technology uses oligonucleotides as PCR (Polymerase Chain Reaction) primers to obtain genotype data.[x] Other examples of oligonucleotide technology and its uses to obtain genetic information are included in the articles cited in the endnotes.[xi] Versions of oligonucleotide technology obtain sample allele frequency data from pooled DNA, or genotype data, by using oligonucleotides as PCR primers to obtain amplified reaction products that are detected by mass spectrometry. Another example of oligonucleotide technology is padlock probes.[xii]
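Comparing the hybridization strengths of two oligonucleotides that differ only at the SNP base can be sketched as a ratio rule: the fraction of the total signal coming from the allele-A probe is near 1 for AA, near 0 for BB, and intermediate for AB. The cut-offs below (0.35 and 0.65) and the function name are hypothetical; real platforms calibrate such boundaries from data.

```python
def call_genotype(i_a, i_b, het_band=(0.35, 0.65)):
    """Genotype from two allele-specific hybridization intensities.

    i_a, i_b: signals from the probes matching allele A and allele B.
    het_band: hypothetical limits of the heterozygote region for i_a / (i_a + i_b).
    """
    total = i_a + i_b
    if total <= 0:
        return None  # no usable signal
    frac_a = i_a / total
    low, high = het_band
    if frac_a > high:
        return "AA"
    if frac_a < low:
        return "BB"
    return "AB"
```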

Other examples of oligonucleotide technology are minisequencing on DNA arrays, dynamic allele-specific hybridization, microplate array diagonal gel electrophoresis, pyrosequencing, oligonucleotide-specific ligation, the TaqMan system and immobilized padlock probes, as presented at the First International Meeting on Single Nucleotide Polymorphism and Complex Genome Analysis.[xiii]

Sets of Oligonucleotides for Genotyping at Bi-allelic Markers or Obtaining Sample Allele Frequency Data

A set of oligonucleotides that is complementary (see definitions) to a group of one or more bi-allelic markers has utility to determine genotype data at each of the markers in the group, including groups with BMEs and approximately bi-allelic markers.

Similarly, a set of oligonucleotides that is complementary to a group of bi-allelic markers has utility to obtain sample allele frequency data for each allele of each marker in the group.

In both cases, obtaining genotype data or sample allele frequency data, the same principle is used: a set of oligonucleotides that is complementary to a group of bi-allelic markers has utility to determine the presence or absence of each allele of each marker in the group in chromosomal DNA.
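With pooled DNA, the same allele-specific signals give an estimate of sample allele frequency rather than a genotype. A minimal sketch, assuming each allele's signal is proportional to the number of copies of that allele in the pool: the frequency of allele A is estimated as S_A / (S_A + k·S_B), where k corrects for unequal signal per copy and is commonly estimated as the mean S_A/S_B ratio measured in known heterozygotes. This correction comes from the DNA-pooling literature and is not described in this application; the function names are hypothetical.

```python
from statistics import mean

def k_from_heterozygotes(ratios):
    """k = mean A/B signal ratio in heterozygotes, whose true allele ratio is 1:1."""
    return mean(ratios)

def pooled_allele_frequency(signal_a, signal_b, k=1.0):
    """Estimated frequency of allele A in a DNA pool: S_A / (S_A + k * S_B)."""
    return signal_a / (signal_a + k * signal_b)
```

If allele A gives twice the signal per copy of allele B (k = 2) and its frequency in the pool is 0.3, the signals are in the ratio 0.6 : 0.7, and the corrected estimate 0.6 / (0.6 + 2 × 0.7) recovers 0.3.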

Using sets of oligonucleotides to obtain Genotype Data/Sample Allele Frequency Data for each marker of a group of bi-allelic markers, wherein the group of markers systematically cover a CL-F region

Genotype data/sample allele frequency data for each marker of a group of bi-allelic markers, wherein the group of bi-allelic markers systematically covers a CL-F region, has great utility for use in the more powerful two-dimensional linkage studies introduced in this application. As described above under Oligonucleotide Technology, some sets of oligonucleotides have utility to determine genotype data at each bi-allelic marker of a group of one or more bi-allelic markers. Similarly, some sets of oligonucleotides have utility to obtain sample allele frequency data for each bi-allelic marker of a group of one or more bi-allelic markers. Therefore, the use of one or more copies of a set of oligonucleotides to obtain genotype data or sample allele frequency data for each bi-allelic marker of a group of one or






PCT/US99/04376(Entry U.S. National Stage Aug. 2000)






38 

Versions of the apparatus comprise means for printing each of the one or more graphs.

Theory of Operation / Set/Subset Example

Systematically Varying Both Marker Chromosomal Location and Marker Allele Frequency of Markers in Linkage Studies

The inventor's calculations and observations have demonstrated the increased power of the TDT in more common, less optimal situations when a bi-allelic marker and bi-allelic gene have (1) similar but not identical allele frequencies and (2) the marker and gene are in some degree of linkage disequilibrium. Thus, for a typical linkage study using bi-allelic markers and an association based linkage test, to increase the likelihood of both criteria (1) and (2) occurring for one or more markers, so as to increase the power of an association based linkage test in a linkage study, the bi-allelic markers used in the study are chosen so that the least common allele frequencies of the markers vary systematically over a range or subrange of least common allele frequency AND the chromosomal location of the markers vary systematically over one or more chromosomes or chromosomal regions. And the bi-allelic markers are chosen so that the markers' chromosomal locations and least common allele frequencies vary systematically in an essentially independent manner.
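One way to picture choosing markers whose chromosomal locations and least common allele frequencies vary systematically and essentially independently is a two-way grid: divide the region into location bins and the frequency range into frequency bins, then take a marker from each (location, frequency) cell, so that every location bin is represented across the whole frequency range. The sketch below is a hypothetical illustration of that idea only; the bin edges, the first-come choice within a cell and the marker tuples are assumptions.

```python
import bisect

def stratified_selection(markers, loc_edges, freq_edges):
    """Pick at most one marker per (location bin, frequency bin) cell.

    markers: iterable of (name, location_cm, least_common_allele_freq).
    loc_edges, freq_edges: increasing bin edges; bins are half-open [lo, hi).
    Returns {(loc_bin, freq_bin): marker_name}.
    """
    chosen = {}
    for name, loc, freq in markers:
        i = bisect.bisect_right(loc_edges, loc) - 1
        j = bisect.bisect_right(freq_edges, freq) - 1
        if 0 <= i < len(loc_edges) - 1 and 0 <= j < len(freq_edges) - 1:
            chosen.setdefault((i, j), name)  # keep the first marker seen in a cell
    return chosen

def empty_cells(chosen, loc_edges, freq_edges):
    """Cells still lacking a marker, i.e. gaps in the systematic variation."""
    return [(i, j) for i in range(len(loc_edges) - 1)
            for j in range(len(freq_edges) - 1) if (i, j) not in chosen]
```

Because every location bin is paired with every frequency bin, knowing a selected marker's location says little about its allele frequency, which is the sense of "essentially independent" illustrated here.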

(In the Theory of Operation/Set/Subset Example Section the traditional symbol used in scientific papers for the disequilibrium coefficient, δ, is used. This should not be confused with the symbol δ used for the covering distance in the remainder of the application. The symbol d is used for the disequilibrium coefficient in the sections of the application other than the Theory of Operation/Set/Subset Example Section.) The theory of operation is based on the mathematical observation that the TDT and other association-based tests for linkage increase in power as the frequencies of the disease-causing allele of a bi-allelic gene and the positively associated allele of a linked bi-allelic marker become similar in magnitude. The inventor made this observation as a result of deriving the equation shown below for P_t (this is Equation 2 in the unpublished manuscript submitted for publication in December 1996 and in the published paper by RE McGinnis in the Annals of Human Genetics vol 62, pp. 159-179, 1998).



$$P_t = .5 + (1 - 2\theta)\left[\cdots c_1 c_4 \cdots c_2 c_3 \cdots\right]\left[p^2 \cdots + 2p(1-p) \cdots + (1-p)^2 \cdots\right] \qquad \text{Equation 2}$$

P_t may be regarded as the size of the "signal" which is given by the TDT to indicate that a tested marker is linked to a disease-causing gene. The more P_t is elevated above 0.5 (baseline), the greater is the evidence for linkage or "power" provided by the association-based linkage test known as the TDT. Table 2 in the unpublished manuscript filed with previous US Provisional Patent Applications (see below) illustrates how signal strength increases substantially as the frequencies of the disease-causing allele and the positively associated marker allele become similar in magnitude. As noted on pages 24 and 25 of the unpublished manuscript (see below), Table 2 assumes that the frequency (p)






39 

of the disease-causing allele is fixed at p=.1 while the frequency (m) of the positively associated marker allele varies (m=.5, .3, .2, .1, .05). Note that when the level of disequilibrium (or association) between the bi-allelic marker and bi-allelic disease gene is fixed (in this case either δ=.5δmax or δ=.9δmax), the signal strength of P_t progressively increases as m decreases from m=.5 to m=.1 (the same frequency as the disease allele, i.e., p=.1). For example, in the section of Table 2 for r=5, note that when δ=.5δmax, P_t is .548 at m=.5 and then steadily increases to .572 (m=.3), .597 (m=.2), .648 (m=.1) and then starts to decrease again as m departs from m=p=.1 (i.e. P_t=.636 at m=.05). As noted on pages 24-25 (below) of the unpublished manuscript, the TDT chi-square statistic (assuming a sample size of 200 families) is such that the signal strength at m=.5 (P_t=.548) does not produce statistically significant evidence for linkage (p-value > 0.5) while the doubling of signal strength at m=.2 (P_t=.597) produces very strong statistical evidence for linkage by the TDT (p-value < 0.005). This sort of substantial increase in power is also true of other association-based linkage tests as the frequencies of the disease-causing allele and associated marker allele become more similar in magnitude.
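The link between P_t and the TDT statistic can be made concrete. If n transmissions from heterozygous parents are scored and the associated allele is transmitted with probability P_t, the expected counts are b = nP_t and c = n(1 - P_t), so the TDT chi-square (b - c)²/(b + c) is about n(2P_t - 1)². Doubling the excess of P_t over 0.5 therefore roughly quadruples the statistic for a fixed n. The sketch below plugs in the Table 2 values quoted above with a hypothetical n = 400; the p-values cited from the manuscript rest on its own assumptions (including how many parents are heterozygous at each m), which this sketch does not reproduce.

```python
import math

def tdt_chi2(p_t, n):
    """Approximate TDT chi-square for n informative transmissions when the
    associated allele is transmitted with probability p_t."""
    return n * (2 * p_t - 1) ** 2

def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical number of informative (heterozygous-parent) transmissions.
N_TRANSMISSIONS = 400
for m, p_t in [(0.5, 0.548), (0.3, 0.572), (0.2, 0.597), (0.1, 0.648)]:
    x = tdt_chi2(p_t, N_TRANSMISSIONS)
    print(f"m={m}: P_t={p_t}, chi-square={x:.2f}, p={chi2_1df_pvalue(x):.2g}")
```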


43 

judicious use of bi-allelic markers of high, medium, and low heterozygosity may be crucial in order to initially detect and replicate linkages to loci conferring modest disease risk.

Set/Subset Example:

Method for locating a disease-causing polymorphism using bi-allelic linkage analysis

Objective: To test, by association-based linkage analysis (e.g., by TDT), whether a disease-causing polymorphism is located on a particular chromosome (e.g., human chromosome 4) or within a particular subregion of that chromosome.

PART 1 - Steps in conducting the association-based linkage test

Step 1

To conduct the test, first divide the chromosome or subregion of interest into segments that are short enough that polymorphisms within each segment are likely to be in linkage disequilibrium with each other. The division of a chromosome or subregion of interest into "segments" is conceptual (not physical) and is based on chromosomal maps such as those provided by the Whitehead Institute or Marshfield Foundation for Biomedical Research. Although disequilibrium has been observed in Finnish populations between polymorphisms that are 7 to 10 centimorgans (cM) apart, the chromosomal segments for searching for disease-causing polymorphisms in more genetically heterogeneous populations should be less than 1 cM long (e.g., 250,000 base pairs long). These chromosomal segments might or might not overlap each other (i.e., share some of their length in common), but the set of chromosomal segments should completely cover the entire chromosome or entire subregion of interest, so that a disease-causing polymorphism located anywhere on the chromosome or anywhere in the subregion of interest will be detected by the test.
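The requirement that the segments completely cover the chromosome or subregion can be checked mechanically. The sketch below lays out overlapping segments of a chosen length (0.8 cM in the example, under the 1 cM suggested above) with a step no larger than that length, and verifies coverage with a simple sweep. Working in cM positions taken from a genetic map, and the particular length and step, are assumptions for illustration.

```python
def segments(start_cm, end_cm, length_cm, step_cm):
    """Conceptual segments [a, a + length] starting every step_cm; a step no
    larger than the length makes consecutive segments touch or overlap."""
    if not 0 < step_cm <= length_cm:
        raise ValueError("need 0 < step <= length for complete coverage")
    segs, a = [], start_cm
    while a < end_cm:
        segs.append((a, min(a + length_cm, end_cm)))
        a += step_cm
    return segs

def covers(segs, start_cm, end_cm):
    """True if the union of the segments contains the whole region."""
    reach = start_cm
    for a, b in sorted(segs):
        if a > reach:
            return False  # a gap between reach and a
        reach = max(reach, b)
    return reach >= end_cm
```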
Step 2

It is well known that increased disequilibrium between a marker and a linked disease locus increases the evidence for linkage provided by association-based linkage tests such as the TDT. However, what has not been recognized is that the specific allele frequencies of the marker locus can also have an enormous impact on the strength of evidence for linkage. I




46 

the nearly identical information with respect to their linkage and association with a third polymorphism such as a disease locus. Hence one of the two bi-allelic markers would provide no additional information, and its inclusion in the subset would not increase the likelihood of detecting linkage and association to a nearby disease locus.

Therefore, bi-allelic markers belonging to the same chromosomal segment and subset should not only have similar allele frequencies; the δ value between each pair of bi-allelic markers in the same subset should also be substantially less than δmax = q - q². This assures that every bi-allelic polymorphism belonging to the subset provides much new (i.e. non-redundant) information about linkage and association to any nearby bi-allelic disease locus; thus testing each bi-allelic marker in the subset would increase the likelihood of detecting linkage to a disease locus.
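The requirement that δ between markers of the same subset be substantially less than δmax can be applied as a greedy filter: take candidate markers in some order and keep a marker only if its disequilibrium with every marker already kept falls below a chosen fraction of δmax for that pair. The fraction (0.5 here), the greedy order and the input format are hypothetical. δmax is computed as min(q1, q2) - q1·q2, which equals q - q² when both frequencies are q.

```python
def delta_max(q1, q2):
    """Largest positive disequilibrium for allele frequencies q1 and q2."""
    return min(q1, q2) - q1 * q2

def non_redundant_subset(freqs, pair_delta, max_fraction=0.5):
    """Greedily keep markers whose pairwise delta with every kept marker
    is below max_fraction * delta_max.

    freqs: allele frequencies of the candidate markers, in the order tried.
    pair_delta: {(i, j): delta} for i < j; missing pairs are treated as 0.
    """
    kept = []
    for i, qi in enumerate(freqs):
        if all(pair_delta.get((min(i, j), max(i, j)), 0.0)
               < max_fraction * delta_max(qi, freqs[j]) for j in kept):
            kept.append(i)
    return kept
```

For frequencies 0.30, 0.32 and 0.28, a pairwise δ of 0.19 between the first two markers is close to their δmax of 0.204, so the second marker is dropped as largely redundant, while the third (δ = 0.02 with the first) is kept.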
Step 4: Test for linkage

To test for (association-based) linkage to a bi-allelic disease locus, each bi-allelic marker in each subset from each chromosomal segment is tested individually by using the TDT, AFBAC method or other family-based linkage test. To conduct these tests for a particular marker, members of nuclear families (most especially parents, and any children who manifest disease) are genotyped at the marker being tested, and the genotypes are then evaluated according to the TDT, AFBAC method or other family-based linkage/association test (for descriptions of the TDT and AFBAC, see Spielman et al, Am J of Human Genetics 52:506-516 (1993) and Thomson, Am J Human Genetics 57:487-498 (1995)). Alternatively, linkage and association are tested for each marker in each subset from each segment by genotyping individuals with disease and related or unrelated normal controls at each marker to be tested.
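For the TDT itself (Spielman et al., cited above), only heterozygous parents are informative: b counts transmissions of the tested allele from heterozygous parents to affected children, c counts transmissions of the other allele, and the statistic (b - c)²/(b + c) is referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes parent/affected-child trios with genotypes coded as the number of copies (0, 1 or 2) of the tested allele; when both parents and the child are heterozygous, it adds one to b and one to c. The input format and function name are assumptions.

```python
import math

def tdt(trios):
    """Transmission/disequilibrium test from parent/affected-child trios.

    Each trio is (father, mother, child), each coded as the number of copies
    (0, 1 or 2) of the tested allele. Returns (b, c, chi_square, p_value).
    """
    b = c = 0  # b: tested allele transmitted; c: other allele transmitted
    for father, mother, child in trios:
        if father == 1 and mother == 1:
            # Both parents heterozygous: the child's allele count equals the
            # number of tested-allele transmissions out of two.
            b += child
            c += 2 - child
        elif father == 1 or mother == 1:
            hom = mother if father == 1 else father  # homozygous parent (0 or 2)
            t = child - hom // 2  # tested-allele transmissions by the heterozygote
            if t not in (0, 1):
                raise ValueError("Mendelian inconsistency in trio")
            b += t
            c += 1 - t
        # Trios with no heterozygous parent carry no TDT information.
    if b + c == 0:
        return b, c, 0.0, 1.0
    chi_square = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi_square / 2))  # chi-square, 1 df, upper tail
    return b, c, chi_square, p_value
```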

(End of set/subset example)

Further Information

(Step 3 is not essential for the operation or utility of this version of the invention. In this set/subset example, the least common allele frequency subrange 0.1 to 0.5 is used. Versions of the invention similar to the set/subset example are operable and have utility for any subrange of the least common allele frequency range 0 to 0.5. In addition, rather than genotyping DNA from single individuals in step 4, in some versions of the invention each marker in each subset from each segment is tested for association with disease by evaluating DNA from pooled samples.)




74 

Statement under Article 19(1)

PCT/US99/04376

Some of the amended claims make use of the phrase "conditional probability", such as claim 11. Some of the amended claims make use of the phrase "proportion of groups", such as claim 14. There are various techniques to calculate or estimate such a probability or such a proportion. These techniques include, but are not necessarily limited to, direct calculation, statistical estimates, and Monte Carlo estimation techniques. Powerful software is available for calculation and statistical estimation for data in matrix format or two-dimensional format. Some such software is available from Cytel Software Corporation, Cambridge, Massachusetts (example: Exact Logistic Regression: Theory and Examples. Mehta CR, Patel NR, Statistics in Medicine, vol 14, 2143-2160 (1995)). Another example is SAS (SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA; A handbook of statistical analyses using SAS by Brian S. Everitt and Geoff Der, Boca Raton, Fla.: Chapman & Hall/CRC, 1998). A further example is MATLAB (The MathWorks, Inc., 3 Apple Hill Drive, Natick, Mass. U.S.A. 01760-2098; MATLAB primer by Kermit Sigmon, 4th ed. Boca Raton: CRC Press, c1994). Statistical techniques include techniques for hypothesis testing, goodness-of-fit and others.
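To illustrate the difference between direct calculation and Monte Carlo estimation mentioned above, the sketch below computes the same binomial tail probability both ways; for example, the probability that at least b of n parental transmissions carry a given allele when each does so with probability 0.5, as under no linkage. The number of replicates and the seed are arbitrary choices, and the Monte Carlo estimate is reported with its binomial standard error.

```python
import math
import random

def exact_upper_tail(b, n, p=0.5):
    """Direct calculation of P(X >= b) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(b, n + 1))

def monte_carlo_upper_tail(b, n, p=0.5, reps=20000, seed=1):
    """Monte Carlo estimate of P(X >= b) and its standard error."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))  # one simulated sample
        hits += successes >= b
    est = hits / reps
    se = math.sqrt(est * (1 - est) / reps)
    return est, se
```

Direct calculation is exact but can become expensive for complex quantities; the Monte Carlo estimate trades exactness for generality, with an error that shrinks as the number of replicates grows.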
The degree of skill in the art in probability and statistics is great. Indeed the inventor's important equation (Equation 2, page 38) is an equation for P_t, wherein P_t is a binomial probability for parental allele transmission, which determines the magnitude of the TDT chi-square statistic. P_s (pages 40-42) is also a binomial probability that determines the magnitude of the ASP test statistic (see Abstract and Paper: Annals of Human Genetics (1998), 62, 159-179. The abstract is available on the World Wide Web and Internet, including at the journal's website.) Skill in the use of computers in the art is also great (page 25).

Some claims, such as claims 11, 12, 13, 14 and others, make use of the phrase "substantially the known set of bi-allelic markers". As pointed out in the description (page 25), information on bi-allelic markers can be gained from sources such as the Whitehead Institute or Marshfield Foundation for Biomedical Research. Information on Single Nucleotide Polymorphisms can similarly be obtained from the sources given in "SNP attack on complex traits", Nature Genetics, volume 20 no. 3, Nov 1998, pp. 217-218.

Some claims, such as claims 11, 12, 13, 14 and others, make use of the term "marker type" or similar terminology. As stated in the description, a bi-allelic marker may be an SNP, a microsatellite marker, or a bi-allelic marker equivalent formed from one or more true bi-allelic markers. "Marker type" means a type of true bi-allelic marker, for example an SNP or a microsatellite; or "marker type" means a bi-allelic marker equivalent of a certain type, such as a bi-allelic marker equivalent formed only from one or more SNPs or a bi-allelic marker equivalent formed only from one or more microsatellites.



